ABSTRACT

The human thymine-DNA glycosylase (TDG) initiates the base excision
repair (BER) pathway to remove spontaneous and induced DNA base damage.
It was first biochemically characterized for its ability to remove T
mispaired with G in a CpG context. TDG is involved in the epigenetic
regulation of gene expression by protecting CpG-rich promoters from de
novo DNA methylation. Here we demonstrate that TDG initiates aberrant
repair by excising T when it is paired with a damaged adenine residue in
a DNA duplex. TDG targets the non-damaged DNA strand and efficiently
excises T opposite hypoxanthine (Hx), 1,N⁶-ethenoadenine,
7,8-dihydro-8-oxoadenine and an abasic site in a TpG/CpX context, where
X is a modified residue. In vitro reconstitution of BER with duplex DNA
containing an Hx•T pair and TDG results in incorporation of cytosine
across from Hx. Furthermore, analysis of the mutation spectra inferred
from single nucleotide polymorphisms in the human population revealed a
highly biased mutation pattern within CpG islands (CGIs), with an
enhanced mutation rate at CpA and TpG sites. These findings demonstrate
that, under the experimental conditions used, TDG catalyzes sequence
context-dependent aberrant removal of thymine, which results in TpG,
CpA→CpG mutations, thus providing a plausible mechanism for the putative
evolutionary origin of the CGIs in mammalian genomes.



INTRODUCTION 

In mammals, post-replicative methylation of cytosine at the 5-position
(5mC) in DNA provides the molecular basis of the epigenetic regulation
of gene expression. DNA methylation is essential for organism
development, cell differentiation, genomic imprinting and suppression of
repetitive elements. The drawback of this mode of regulation is that
spontaneous deamination of 5mC generates thymine, resulting in a G•T
mismatch which, if not repaired, leads to C→T transition mutations at
CpG dinucleotides. In fact, it was proposed that the low CpG content in
mammalian genomes is due to this high mutability of 5mC (1). In
mammalian cells, both the mismatch-specific thymine-DNA glycosylase
(TDG) and methyl-binding domain protein 4 (MBD4/MED1) prevent the
mutagenic impact of 5mC deamination by excising thymine from G•T
mispairs in a CpG context; the thymine is then replaced by cytosine,
completing base excision repair (BER) (2,3). In the BER pathway, a DNA
glycosylase recognizes the abnormal base and catalyzes cleavage of the
base-sugar bond, generating an abasic site, which in turn is repaired by
an apurinic/apyrimidinic (AP) endonuclease (4,5). The human TDG and MBD4
were first biochemically characterized for their ability to remove T
mispaired with G. A more detailed characterization showed that TDG
exhibits a wide DNA substrate specificity: it excises
3,N⁴-ethenocytosine (εC) (6,7), thymine glycol (8), 5-hydroxycytosine
(9), 7,8-dihydro-8-oxoadenine (8oxoA) (10), mismatched uracil (11) and
its derivatives with modifications at the C5 position (12). In contrast,
MBD4 has a narrow DNA substrate specificity: in addition to T, it
excises uracil, 5-fluorouracil and 5-hydroxymethyluracil when these
bases are opposite to a guanine in duplex DNA (3,13,14). Importantly,
TDG is highly conserved in vertebrates (Supplementary Figure S1).
The TDG and MBD4 proteins have a very low turnover rate because of
their high affinity for the AP site generated after base excision (15).
The major human AP endonuclease 1 (APE1) can stimulate TDG-catalyzed DNA
glycosylase activities by increasing the dissociation rate of TDG from
the AP site (16). A second factor facilitating TDG enzymatic turnover is
the SUMOylation of TDG by SUMO-1 and SUMO-2/3, which drastically reduces
TDG's affinity for AP sites and increases its enzymatic turnover toward
DNA substrates (17). TDG is also implicated in the regulation of
transcription of retinoic acid and estrogen receptors, c-jun and thyroid
transcription factor 1 (18,19). The TDG protein interacts with the
transcriptional co-activator CBP/p300, and the resulting TDG/CBP/p300
complex is competent for both BER and histone acetylation (20). The TDG
protein enhances the CBP/p300 transcriptional activity and, in turn,
CBP/p300 acetylates TDG. Acetylation of TDG regulates the recruitment of
APE1. Thus, TDG-catalyzed repair and transcriptional activities are
coupled via post-translational modifications (SUMOylation and
acetylation) and by protein-protein interactions. Importantly, TDG
expression is strictly cell-cycle regulated: it is present in cells
throughout the G2-M and G1 phases, but rapidly disappears in the S phase
(21). The presence of ectopically expressed TDG hinders S-phase
progression and cell proliferation.

Structural studies revealed that TDG binds to 23-28-mer DNA duplexes in
a 2:1 ratio, with a second molecule of protein generating a
non-catalytic enzyme:substrate complex; this protein dimer complex
cannot be formed on a short 15-mer DNA duplex (22,23). In relation to
these observations, it has been shown that human O⁶-alkylguanine-DNA
alkyltransferase (AGT) and Escherichia coli mismatch-specific uracil-DNA
glycosylase (MUG), a bacterial homolog of TDG, exhibit cooperative
binding to DNA substrates and can form protein oligomers on DNA with
4-12 base periodicity (24,25). It was hypothesized that a cooperative
mode of binding could enable more efficient lesion search and/or protect
DNA repair intermediates before handing them to downstream processing.
However, the formation of a dimeric complex on a 28-mer DNA duplex had
no measurable effect on the uracil-DNA glycosylase activity of TDG (22).

Recent advances in understanding the mechanisms of active DNA
demethylation in mammals have identified the Ten-eleven translocation
family of proteins (TETs) as 5-methylcytosine (5mC) hydroxymethylases.
TETs convert 5mC to 5-hydroxymethylcytosine and then further oxidize it
to 5-formylcytosine (5fC) and 5-carboxylcytosine (5caC), both in vitro
and in vivo (26-29). TDG excises 5fC and 5caC residues in a CpG context
with high efficiency, suggesting a direct involvement of the
TDG-initiated BER pathway in the active erasure of 5mC from the genome
(28,30). Furthermore, it has been found that TDG knockout mice are
embryonic lethal due to aberrant de novo DNA methylation of CpG island
(CGI) promoters of developmental genes, which results in failure to
establish and/or maintain cell-type-specific gene expression programs
during embryonic development (31,32). Vertebrate CGIs are short
interspersed CG-rich unmethylated genomic regions that are present near
the transcription start sites (TSSs) of genes (33). In mammalian
genomes, CGIs are typically 500-3000 base pairs in length and have been
found in or near half of the promoters of mammalian genes (34). CGIs are
key regulatory elements in transcription regulation: they are enriched
in permissive histone modifications, poor in DNA cytosine methylation
and contain multiple sites for transcription factors (35). Despite the
low CpG content of mammalian genomes, enrichment of CGIs at TSSs
appeared early in the evolution of vertebrates, suggesting that the
association of CGIs with promoter regions was a consequence of
warm-blooded vertebrate evolution (36). It was hypothesized that the
emergence and stabilization of the CpG-rich context of CGIs could be due
to a hypodeamination regime associated with a low level of DNA
methylation and/or to GC-biased gene conversion, a non-reciprocal
copying of a DNA sequence from one homologous chromosome onto the other
during meiotic recombination (37,38). Interestingly, the CGI-containing
primate promoters exhibit the highest rate of divergence/mutation when
compared with other distant mammalian species, suggesting heterotachy,
that is, accelerated evolution of primate promoters (39). At present,
understanding the molecular mechanisms underlying the CpG enrichment at
transcriptional regulatory regions and the emergence of CGIs in
evolution requires further investigation.

In the BER pathway, DNA glycosylases specifically recognize and excise
modified DNA bases among the vast majority of regular bases. It is
generally agreed that the main function of BER is to thwart the
genotoxic effects of spontaneous and oxidative DNA base damage (5). In
this respect, mismatches between two regular bases, owing to spontaneous
deamination of 5mC to T and also to DNA polymerase errors during
replication, present a challenging puzzle to repair systems. To
counteract these mutagenic threats to genome stability, cells evolved
special mono-functional DNA glycosylases that can target the non-damaged
DNA strand to remove mismatched regular bases. The sequence-specific E.
coli Vsr endonuclease, the mismatch-specific adenine-DNA glycosylases
(E. coli MutY and human MutY homologue (MYH)) and the thymine-DNA
glycosylases TDG/MBD4 recognize and remove regular bases in mismatched
DNA duplexes (40-42). Intriguingly, it was shown that E. coli MutY can
act in a mutagenic manner during DNA replication by excising a regular A
in the non-damaged template DNA strand opposite to a mis-incorporated
8-oxoguanine residue, consequently leading to fixation of an A•T→C•G
mutation (43). This observation raises the possibility that specific DNA
repair mechanisms exist that can introduce bias in spontaneous mutation
spectra. Remarkably, certain mutations in the active sites of classic
human uracil-DNA glycosylase (UDG) and TDG, which might occur in vivo,
can result in a dramatic change of their DNA substrate specificity
(44,45). The UDG-Y147A and UDG-N204D mutants excise regular thymine and
cytosine residues in DNA, respectively, whereas the TDG-A145G, TDG-H151A
and double TDG-A145G-H151Q mutants exhibit non-specific activity toward
thymine in an A•T base pair. These artificially engineered DNA
glycosylases with aberrant activities can be highly cytotoxic and
mutagenic in vivo (44).

Here, we characterized the aberrant DNA glycosylase activities of
wild-type MBD4 and TDG enzymes in vitro. Unexpectedly, we found that TDG
can introduce T→C mutations in a sequence context-dependent and DNA
replication-independent manner. Moreover, data obtained from analysis of
single-nucleotide polymorphisms (SNPs) from the human genome revealed a
very biased spectrum of spontaneous mutation in the CGIs. The role of
TDG-catalyzed DNA repair activities in the evolution of CpG-rich regions
in mammalian genomes is discussed.

MATERIALS AND METHODS 

Chemicals, reagents and proteins 

Restriction enzymes and T4 DNA ligase were pur- 
chased from New England Biolabs (Evry, France). The 
E. coli BL21(DE3) cells were purchased from Novagen- 
EMD4Biosciences (Merck Chemicals, Nottingham, UK). 
The collection of purified DNA glycosylases and AP endonucleases was
from the laboratory stock (46). The activities of the various DNA repair
proteins were tested on their principal substrates just prior to use.

Oligonucleotides 

Sequences of the oligonucleotide duplexes used in the present work are
shown in Table 1. All oligonucleotides containing modified bases and
their complementary strands were purchased from Eurogentec (Seraing,
Belgium), including the following: 40-mer
d(AATTGCTATCTAGCTCCGCXCGCTGGTACCCATCTCATGA), where X is either
hypoxanthine (Hx), 8oxoA, 1,N⁶-ethenoadenine (εA), tetrahydrofuran (THF,
an abasic site analog), 7,8-dihydro-8-oxoguanine (8oxoG), εC,
5,6-dihydrouracil (DHU) or alpha-2′-deoxyadenosine, and the
complementary 40-mer d(TCATGAGATGGGTACCAGCGTGCGGAGCTAGATAGCAATT), where
T is opposite to the lesion. The oligonucleotides were 5′-end labeled
with [γ-³²P]-adenosine triphosphate (ATP) (PerkinElmer, France) and then
annealed with the corresponding complementary strands as described
previously (47). The resulting oligonucleotide duplexes are referred to
in the text as X•Y (NYN), where X is a residue in the [³²P]-labeled top
strand, Y is a residue opposite to X in the complementary non-labeled
bottom strand and N is a regular DNA base immediately neighboring the 5′
and 3′ sides of Y.
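The relationship between the two 40-mers quoted above can be checked with a short script. This is only an illustrative sketch: the mapping of the lesion X to T in the partner strand is a bookkeeping assumption that mirrors the statement that T is opposite to the lesion, and the helper name `partner_strand` is hypothetical.

```python
# Sketch: confirm that the second 40-mer is the reverse complement of the
# first, with T placed opposite the lesion X (assumption taken from the text).
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G", "X": "T"}


def partner_strand(top):
    """Return the complementary strand written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(top))


top = "AATTGCTATCTAGCTCCGCXCGCTGGTACCCATCTCATGA"
bottom = partner_strand(top)

# 0-based index, in the bottom strand, of the T that faces the lesion.
t_index = len(top) - 1 - top.index("X")

print(bottom)
print("T opposite X is base", t_index + 1, "of the bottom strand")
print("dinucleotide starting at that T:", bottom[t_index:t_index + 2])
```

Under these assumptions the T facing the lesion is followed by G, and cleavage at that T in a 5′-labeled bottom strand would leave a labeled fragment of `t_index` nucleotides.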

Expression and purification of TDG and MBD4 

The expression and purification of the TDG, TDGcat and MBD4 proteins
were performed as described previously (10,46). Briefly, Rosetta 2 (DE3)
cells were transformed with the expression vectors pET28c-TDG (for the
full-length TDG protein), pET28c-TDGcat (for the catalytic domain TDG
protein) and pET6H-MBD4 (for the full-length MBD4 protein) and then
grown at 37°C in Luria Broth (LB) medium, supplemented with 50 µg·ml⁻¹
of kanamycin or ampicillin, on an orbital shaker to OD600nm = 0.6-0.8.
The temperature was then reduced to 30°C, protein expression was induced
with 0.2 mM isopropyl β-D-1-thiogalactopyranoside and the cells were
grown further, either for 3 h for TDG and MBD4 induction or for 15 h for
TDGcat induction. Bacteria were harvested by centrifugation and cell
pellets were lysed using a French press at 18 000 psi in buffer
containing 20 mM HEPES-KOH pH 7.6, 50 mM KCl supplemented with Complete™
Protease Inhibitor Cocktail (Roche Diagnostics, Switzerland). Lysates
were cleared by centrifugation at 40 000 × g for 1 h at 4°C; the
resulting supernatant was adjusted to 500 mM NaCl and 20 mM imidazole
and loaded onto a HiTrap Chelating HP column (Amersham Biosciences, GE
Health). All purification procedures were carried out at 4°C. The column
was washed with buffer A (20 mM HEPES, 500 mM NaCl, 20 mM imidazole) and
the bound proteins were eluted with a linear gradient of 20-500 mM
imidazole in buffer A. Eluted fractions were analyzed by sodium dodecyl
sulphate-polyacrylamide gel electrophoresis (SDS-PAGE), and fractions
containing the pure His-tagged TDG, TDGcat and MBD4 proteins were stored
at −80°C in 50% glycerol. The concentration of purified proteins was
determined by the method of Bradford.

Mammalian cell culture and protein extracts preparation 

Mouse embryonic fibroblast (MEF) WT and MEF Tdg−/− cell lines were
obtained as previously described (48). MEF-WT and MEF-Tdg−/− cells were
maintained in Dulbecco's modified Eagle's medium (Invitrogen)
supplemented with 10% fetal calf serum, 100 U·ml⁻¹ penicillin and 100
µg·ml⁻¹ streptomycin at 37°C in the presence of 5% CO2. Nuclear and
cytosolic MEF extracts were prepared as previously described (49) with
minor modifications. All manipulations were carried out at 4°C. The cell
pellets were washed twice in cold phosphate-buffered saline (PBS),
followed by a wash in the hypotonic buffer containing 250 mM sucrose
supplemented with Complete™ Protease Inhibitor Cocktail (Roche
Diagnostics, Switzerland). The pellets were then re-suspended in the
hypotonic buffer without sucrose, left to swell on ice for 10 min and
lysed by 35 strokes of a tight-fitting Dounce homogenizer. The resulting
lysates were centrifuged at 2000 × g for 5 min at 4°C (to collect the
nuclei) and the supernatants were further clarified by centrifugation at
15 000 × g for 20 min at 4°C. The supernatants (cytosolic extracts) were
stored in aliquots at −80°C. For nuclear extracts, the nuclei pellets
collected previously were suspended (v/v) in buffer containing 0.5 M
NaCl and 0.2% NP-40. The nuclear suspension was left for an additional
10 min at 4°C, after which the nuclei were spun down for 20 min at 12
000 × g. The supernatants (nuclear extracts) were stored in aliquots at
−80°C.

DNA repair activity 

The standard reaction mixture (20 µl) for DNA repair assays contained 5
nM of 5′-[³²P]-labeled duplex oligonucleotide, 20 mM Tris-HCl (pH 8.0),
100 mM NaCl, 1 mM ethylenediaminetetraacetic acid (EDTA), 1 mM DTT, 100
µg·ml⁻¹ bovine serum albumin (BSA) and 50 nM of TDG, TDGcat (truncated
catalytic domain TDG protein) or MBD4; reactions were incubated for 1 h
at 37°C, unless otherwise stated. The standard reaction mixture (50 µl)
for repair assays in cell-free extracts contained 2.5 nM
5′-[³²P]-labeled oligonucleotide duplex in 50 mM KCl, 20 mM HEPES-KOH
(pH 7.6), 0.1 mg/ml BSA, 1 mM DTT, 1 mM EDTA and 30 µg of either
cytosolic or nuclear protein extracts from MEFs, unless otherwise
stated. The reaction mixtures were incubated at 37°C
Table 1. DNA sequence of the oligonucleotide duplexes used in the studyᵃ

Name of duplex      DNA sequence of oligonucleotide substrates

G-34                C TAT CCA CTA CTA TCC TCA TGA TCT ACT TCA ATC
                    G ATA GGT GAT GAT AGG AGT ACT AGA TGA AGT TAG

T•X(NXN)            T CAT GAG ATG GGT ACC AGC NTN CGG AGC TAG ATA GCA ATT
                    A GTA CTC TAC CCA TGG TCG NXN GCC TCG ATC TAT CGT TAA

T•Y(CXC)            T CAT GAG ATG GGT ACC AGC GTG CGG AGC TAG ATA GCA ATT
                    A GTA CTC TAC CCA TGG TCG CYC GCC TCG ATC TAT CGT TAA

Hx•C                A ATT GCT ATC TAG CTC CGC XCG CTG GTA CCC ATC TCA TGA
                    T TAA CGA TAG ATC GAG GCG CGC GAC CAT GGG TAG AGT ACT

G•N                 A ATT GCT ATC TAG CTC CGC GCG CTG GTA CCC ATC TCA TGA
                    T TAA CGA TAG ATC GAG GCG NGC GAC CAT GGG TAG AGT ACT

T•G                 A ATT GCT ATC TAG CTC CGC TGG CTG GTA CCC ATC TCA TGA
                    T TAA CGA TAG ATC GAG GCG GCC GAC CAT GGG TAG AGT ACT

T•εA-mHa-ras        G CAT GGC ACT ATA CTC TTC TTG ACC TGC TGT GTC TAA GAT
                    C GTA CCG TGA TAT GAG AAG AXC TGG ACG ACA CAG ATT CTA

                    C ACT GGA GTC TTC CAG TGT GAT GCT TGT GAG GAT GGG CCT
                    G TGA CCT CAG AAG GTC ACA CTX CGA ACA CTC CTA CCC GGA

T•Z 28 mer          G TGT CAC CAC CGC TCA TGT ACA GAG CTG
                    C ACA GTG GTG GCG AGT ZCA TGT CTC GAC

8oxoA•N 28 mer      G TGT CAC CAC CGC TCA NGT ACA GAG CTG
                    C ACA GTG GTG GCG AGT XCA TGT CTC GAC

U•G 28 mer          G TGT CAC CAC CGC TCA UGT ACA GAG CTG
                    C ACA GTG GTG GCG AGT GCA TGT CTC GAC

T•Z 15 mer          T CAT GTA CAG AGC TG
                    A GTZ CAT GTC TCG AC

8oxoA•N 15 mer      T CAX GTA CAG AGC TG
                    A GTN CAT GTC TCG AC

U•G 15 mer          T CAU GTA CAG AGC TG
                    A GTG CAT GTC TCG AC

ᵃThe following symbols are used to designate the modified and regular
DNA bases: X is for 8oxoA and Hx; Y is for THF, εA, 2oxoA, 8oxoG, DHU,
εC and αA; Z is for εA, Hx and G; N is for C, G, A and T.



for 4 h when measuring TDG-specific activities (T•G, T•Hx and 8oxoA•G
duplexes in which the T- and 8oxoA-containing strands were
[³²P]-labeled) or 100 min when measuring human alkyl-N-purine DNA
glycosylase (ANPG)-specific activities (Hx•T duplex in which the
Hx-containing strand was [³²P]-labeled). After incubation, the samples
were treated either with 0.1 M NaOH for 3 min at 99°C and then
neutralized with 0.1 M HCl, or with light piperidine (10% (v/v)
piperidine at 37°C for 40 min), in order to cleave at AP sites left
after excision of damaged bases. To analyze reaction products, the
samples were desalted using a Sephadex G25 column (Amersham Biosciences)
equilibrated in 7.5 M urea and the cleavage fragments were separated by
electrophoresis in denaturing 20% (w/v) polyacrylamide gels (7 M urea,
0.5 × TBE, 42°C). The gels were exposed to a Fuji FLA-3000 Phosphor
Screen, then scanned with a Fuji FLA-3000 and/or Typhoon FLA 9500 and
quantified using Image Gauge V4.0 software. The release of the T residue
was measured by the cleavage of the oligonucleotide containing a single
T•X pair, where X is a residue opposite to T in the complementary
strand.



Single turnover kinetics 

Here we used single turnover kinetics under a large excess of enzyme
over substrate ([E] >> [S] > Kd) to obtain rate constants (kobs) that
are not affected by enzyme-substrate association or by product
inhibition, such that kobs reflects the maximal base excision rate (kobs
≈ kmax). The data were fitted by nonlinear regression to a one-phase
exponential association [Equation (1)] using GraphPad Prism 5 software,

$$[\mathrm{Fraction\ product}] = A\,\bigl(1 - \exp(-k_{\mathrm{obs}}\,t)\bigr) \qquad (1)$$

where A is the amplitude, kobs is the rate constant and t is
the reaction time (in minutes).
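As an illustration of what fitting Equation (1) involves, the sketch below recovers A and kobs from a time course using only the Python standard library. It is not the GraphPad Prism procedure used in this work: the search strategy (closed-form amplitude for each trial kobs, a coarse logarithmic grid, then golden-section refinement), the function names and the synthetic data (A = 0.9 is a hypothetical amplitude; 0.165 min⁻¹ is the T•G value quoted in the Results) are all assumptions made for the example.

```python
import math


def model(t, amplitude, k_obs):
    """One-phase exponential association, Equation (1)."""
    return amplitude * (1.0 - math.exp(-k_obs * t))


def best_amplitude(times, fractions, k_obs):
    """Least-squares amplitude for a fixed k_obs (linear in A)."""
    f = [1.0 - math.exp(-k_obs * t) for t in times]
    return sum(y * fi for y, fi in zip(fractions, f)) / sum(fi * fi for fi in f)


def sse(times, fractions, k_obs):
    """Sum of squared residuals at the best amplitude for this k_obs."""
    a = best_amplitude(times, fractions, k_obs)
    return sum((y - model(t, a, k_obs)) ** 2 for t, y in zip(times, fractions))


def fit_single_turnover(times, fractions, k_lo=1e-4, k_hi=10.0, steps=200):
    """Return (A, k_obs) minimizing the squared error."""
    # Coarse scan on a logarithmic grid to bracket the minimum.
    grid = [k_lo * (k_hi / k_lo) ** (i / steps) for i in range(steps + 1)]
    best = min(range(len(grid)), key=lambda i: sse(times, fractions, grid[i]))
    lo, hi = grid[max(best - 1, 0)], grid[min(best + 1, steps)]
    # Golden-section refinement inside the bracket.
    phi = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(100):
        c = hi - phi * (hi - lo)
        d = lo + phi * (hi - lo)
        if sse(times, fractions, c) < sse(times, fractions, d):
            hi = d
        else:
            lo = c
    k_obs = (lo + hi) / 2.0
    return best_amplitude(times, fractions, k_obs), k_obs


# Synthetic, noise-free time course (minutes, fraction of product).
times = [1, 2, 5, 10, 20, 30, 60, 90]
fractions = [model(t, 0.9, 0.165) for t in times]
amplitude_fit, k_fit = fit_single_turnover(times, fractions)
print(amplitude_fit, k_fit)
```

With real gel quantifications the residuals would not vanish, and a dedicated nonlinear least-squares routine that also reports standard errors would be preferable.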

The enzymatic assays were performed in a large-volume reaction mixture
with 500 nM TDG and 50 nM duplex oligonucleotide for varying periods of
time at 37°C. At each time point, 20 µl of sample was withdrawn and
treated with NaOH (0.1 M) for 3 min at 99°C, as previously described
(10). Reaction products were analyzed by electrophoresis on denaturing
20% (w/v) polyacrylamide gels (7 M urea, 0.5 × TBE at 42°C) before
quantification as described above.
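The enzyme excess in these assays follows directly from the stated concentrations. The sketch below is only a unit-conversion aid (1 nM corresponds to 1 fmol per µl); the function names are hypothetical.

```python
def amount_fmol(conc_nM, volume_ul):
    """Amount in fmol: 1 nM equals 1 fmol per microliter."""
    return conc_nM * volume_ul


def molar_excess(enzyme_nM, substrate_nM):
    """Molar ratio of enzyme to substrate."""
    return enzyme_nM / substrate_nM


aliquot_ul = 20                           # volume withdrawn per time point
tdg_fmol = amount_fmol(500, aliquot_ul)   # TDG in one aliquot
duplex_fmol = amount_fmol(50, aliquot_ul) # duplex in one aliquot
excess = molar_excess(500, 50)            # enzyme:substrate ratio
print(tdg_fmol, duplex_fmol, excess)
```

Each 20 µl aliquot therefore carries a 10-fold molar excess of TDG over duplex, consistent with the single turnover regime described above.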

In vitro reconstitution of Hx•T repair by TDG

5 nM [³²P]-labeled Hx•T duplex was incubated in the presence of 5 nM
APE1, 0.1 units Polβ and 2 units T4 DNA Ligase, 50 µM
deoxyribonucleoside triphosphates (dNTPs), and either 300 nM TDG or 80
nM ANPG, in buffer containing 20 mM HEPES-KOH (pH 7.6), 100 mM NaCl, 0.1
mg·ml⁻¹ BSA, 1 mM DTT, 2 mM ATP and 5 mM MgCl2 for 30 min at 37°C, and
then the reaction mixtures were incubated with 1 unit of BstUI
restriction enzyme for 40 min at 60°C. Reaction products were analyzed
on denaturing PAGE as described above.

Construction of circular DNA plasmid containing T•Hx pair
within stop codon

pT7Blue-3Rev-TGA and control pT7Blue-3Rev-CGA plasmids are derivatives
of pT7Blue-3 AmpR, KamR (Novagen, EMD Millipore, MA, USA); they contain
a reversed f1 origin and either a TGA stop codon or a CGA-Arg codon,
respectively, inserted after Met19 of the kanamycin resistance gene.
Note that pT7Blue-3 and its derivatives do not have any mammalian
replication origins and cannot replicate in MEFs. The plasmid vectors
were obtained by polymerase chain reaction-based site-directed
mutagenesis. The circular heteroduplex DNA substrate
pT7Blue-3Rev-TGA-Hx, containing Hx opposite to T within the TGA codon
(TpCpHx/TpGpA context), was constructed by primer extension, using the
5′-phosphorylated pHx-Kan29 d(pTCAGCATCTCHxCATGTTGGAATTTAATCG)
oligonucleotide containing a single Hx residue as a primer and
single-stranded phagemid DNA as a template, as described previously
(50,51). After synthesis of the second strand and ligation, the
covalently closed circular heteroduplex plasmid DNA was agarose gel
purified using the Qiagen MinElute Gel Extraction Kit (Qiagen, France).

Transient pT7Blue-3Rev-TGA-Hx transfection assay of
MEF cells

For the in vitro assay, 6 ng of pT7Blue-3Rev-TGA-Hx plasmid DNA was
incubated or not in the presence of 500 nM TDG or 50 nM ANPG proteins
under standard reaction conditions (30 min and 15 min at 37°C,
respectively) and then electroporated into E. coli XL1-Blue competent
cells (Stratagene, CA, USA); the transformants were selected on LB agar
plates containing either kanamycin (Kam) or ampicillin (Amp). The
mutation rates to KamR were calculated as the ratio of KamR/AmpR
colonies; individual KamR clones were isolated and sequenced by
GATC-Biotech (Germany) to characterize the mutation spectra.
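The colony-ratio calculation described above can be written out as follows. The colony counts are hypothetical numbers chosen only to show the arithmetic, and the function names are illustrative.

```python
def mutation_frequency(kam_colonies, amp_colonies):
    """Ratio of kanamycin-resistant to ampicillin-resistant colonies."""
    if amp_colonies <= 0:
        raise ValueError("at least one ampicillin-resistant colony is required")
    return kam_colonies / amp_colonies


def fold_change(treated, control):
    """Fold increase of a treated sample over its untreated control."""
    return treated / control


# Hypothetical counts (not data from this study).
untreated = mutation_frequency(kam_colonies=3, amp_colonies=6000)
tdg_treated = mutation_frequency(kam_colonies=30, amp_colonies=6000)
print(untreated, tdg_treated, fold_change(tdg_treated, untreated))
```

Plating the same transformation on both antibiotics normalizes the count of revertant plasmids to the total number of transformed plasmids, so the ratio is independent of transformation efficiency.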

For the ex vivo transient transfection assay, pT7Blue-3Rev-TGA-Hx
plasmid DNA was transfected into MEF-WT and MEF-Tdg−/− cells grown to
80% confluence, using ExGen 500 reagent (Euromedex, Souffelweyersheim,
France) according to the manufacturer's recommendations. At 6 h after
transfection, the cells were washed three times with PBS buffer and then
treated with 1.75 U·µl⁻¹ DNase I (New England Biolabs, France) in buffer
containing 10 mM Tris-HCl, pH 7.6, 2.5 mM MgCl2, 0.5 mM CaCl2 for 30 min
at 37°C. After DNase I treatment, the cells were washed three times with
PBS buffer and harvested by centrifugation at 5000 × g for 10 min at
4°C. The plasmid DNA was purified from cell pellets using the QIAGEN
Plasmid Mini Kit as recommended by the manufacturer (Qiagen, France).
Purified DNA was then electroporated into E. coli XL1-Blue competent
cells and plated on LB agar plates as described above. The mutation rate
measurements and sequencing were performed as described above.

Bioinformatics 

TDG sequence data were downloaded from the NCBI site. Multiple sequence
alignment was performed using ClustalX 2.1. The phylogenetic tree was
created with PHYLIP 3.6. CGI data and dbSNP build 127 were downloaded
from the UCSC Genome Bioinformatics site as plain text files. For SNP
analysis, we developed special programs to parse the input data and
compute the required statistics.
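The programs used for the SNP analysis are not described here, so the following is only a sketch of one way such a tally could be organized: substitutions are counted according to the dinucleotide context of the affected base. The input layout (0-based position, reference base, alternative base), the function names and the toy sequence are all assumptions for illustration.

```python
from collections import Counter


def dinucleotide_contexts(seq, pos):
    """Return the dinucleotides that include the base at 0-based pos."""
    contexts = []
    if pos > 0:
        contexts.append(seq[pos - 1:pos + 1])
    if pos + 1 < len(seq):
        contexts.append(seq[pos:pos + 2])
    return contexts


def tally_snps(seq, snps, watched=("CG", "TG", "CA")):
    """Count substitutions per (dinucleotide, change) for watched contexts."""
    counts = Counter()
    for pos, ref, alt in snps:
        if seq[pos] != ref:
            raise ValueError(f"reference mismatch at position {pos}")
        for dinucleotide in dinucleotide_contexts(seq, pos):
            if dinucleotide in watched:
                counts[(dinucleotide, f"{ref}>{alt}")] += 1
    return counts


# Toy reference fragment and SNP list (hypothetical, not dbSNP data).
reference = "ACATGCGTA"
snps = [(3, "T", "C"), (2, "A", "G"), (5, "C", "T")]
counts = tally_snps(reference, snps)
for key, value in sorted(counts.items()):
    print(key, value)
```

A real analysis would additionally restrict positions to annotated CGI intervals and normalize each count by the number of occurrences of the corresponding dinucleotide.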



RESULTS 

Human TDG excises thymine opposite to damaged adenine 
in oligonucleotide duplexes 

Since TDG exhibits a wider substrate specificity compared to MBD4, we
decided to further explore the DNA substrate specificity of the former.
For this, a 5′-[³²P]-labeled 34-mer oligonucleotide containing a single
G residue at position 21 was hybridized to a complementary strand and
then treated with chloroacetaldehyde (CAA), followed by incubation with
APE1, the mono-functional DNA glycosylases 3-methyladenine-DNA
glycosylase II (AlkA), MUG and MutY from E. coli, and TDG and
alkyl-N-purine-DNA glycosylase (ANPG) from human. After reaction with
the DNA glycosylases, samples were treated with alkali to reveal the
presence of AP sites (Figure 1A). It should be noted that we can observe
cleavage of the labeled DNA strand only; actions of DNA repair enzymes
on the non-labeled complementary strand cannot be detected by this
approach. CAA is a carcinogenic compound that reacts with all DNA bases
except thymine and generates εC, εA and N²,3-ethenoguanine (εG) adducts
(52). As expected from previous studies (53), the E. coli AlkA protein
cleaves CAA-DNA at the G21 position and generates a 20-mer product,
indicating the presence of εG (lane 8), whereas incubations with E. coli
MUG and human ANPG resulted in DNA cleavage at C and A residues,
respectively (lanes 9 and 11), indicating the presence of εC and εA
residues, respectively. Interestingly, the E. coli MutY protein excises
the G21 residue, suggesting that it may recognize εG (lane 10).
Unexpectedly, incubation of the CAA-DNA with TDG resulted in the
formation of a single 19-mer cleavage fragment, indicating the excision
of T at position T20, which is located in a TpG sequence context (lane
12). Importantly, no cleavage is observed at other thymine positions,
nor when untreated DNA was incubated with TDG (lane 4), suggesting that
TDG recognizes T in the damaged DNA duplex, and only in the TpG context.
Based on these observations, we hypothesized that CAA reacted with
adenines in the non-labeled complementary strand of the 34-mer duplex
and generated T•εA base pairs in which the regular T was recognized by
TDG as a mismatched base.

Next, we reasoned that other modifications of adenine in a DNA duplex in
the specific sequence context may also expose the non-damaged
complementary thymine residues to TDG action. To examine this, we
constructed several 5′-[³²P]-labeled 40-mer oligonucleotide duplexes
containing single T•G, T•εA, T•Hx and T•8oxoA base pairs in a TpG/CpX
sequence context, where X is a modified adenine. The duplexes were
incubated with the human TDG (full length), catalytic domain TDG
(TDGcat, amino acids 111-308) and full-length MBD4 proteins. As
expected, all three DNA glycosylases excise thymine in the T•G duplex
and generate a 20-mer cleavage product (Figure 1B, lanes 2-4). In
addition, TDG excises thymine opposite to εA, Hx and 8oxoA residues,
with the relative order of efficiency T•G > T•Hx > T•εA >> T•8oxoA
(lanes 3, 7, 11 and 15). This result confirms our finding that TDG
excises T opposite to εA in CAA-treated DNA (Figure 1A, lane 12). TDGcat
exhibits the same DNA substrate preference as TDG but excises T
Figure 1. Action of mono-functional DNA glycosylases on oligonucleotide
duplexes containing base lesions and mismatches. (A) Cleavage of duplex
oligonucleotide containing ethenobases by various DNA glycosylases. 20
nM 5′-[³²P]-labeled 34-mer oligonucleotide duplex was treated or not
with 1% CAA and then incubated in the presence of 100 nM DNA glycosylase
or 10 nM APE1 for 1 h at 37°C. Lane 1: control DNA; lane 2: DNA cleaved
at G20 position; lanes 3-6: as 1 but with enzymes; lane 7: as 1 but 1%
CAA; lanes 8-13: as 7 but with enzymes. (B) Excision of mismatched T in
various oligonucleotide duplexes by full-length TDG, TDGcat and MBD4. 5
nM 5′-[³²P]-labeled 40-mer oligonucleotide duplex containing T paired
with G, εA, Hx and 8oxoA was incubated with 100 nM DNA glycosylase for 1
h at 30°C. After reaction, all samples were treated with light
piperidine (10% (v/v), 40 min at 37°C) to cleave at AP sites. The
reaction products were analyzed as described in the Materials and
Methods section.



opposite to modified A very weakly (lanes 2, 6, 10 and 14), suggesting
that the N- and C-terminal portions of TDG are required for the
efficient recognition of damaged T•A pairs. Interestingly, MBD4 can
excise T in the T•εA duplex, albeit with very low efficiency (lanes 4
and 8), but not in the T•Hx and T•8oxoA duplexes (lanes 12 and 16). In
an additional screen, we show that TDG can also excise T opposite to
8oxoG, εC, DHU, alpha-anomeric 2′-deoxyadenosine and THF, a synthetic
analog of the AP site (Supplementary Figure S2). Taken together, these
results show that TDG can recognize T in a TpG/CpA* sequence context
when the complementary A* is absent or has undergone chemical
modification.




Figure 2. Kinetics and sequence context dependences of TDG-catalyzed
excision of mismatched T. (A) Graphic presentation of pre-steady-state
single turnover kinetics of TDG-catalyzed cleavage of various
oligonucleotide duplexes. Time kinetics were performed using 500 nM TDG
and 50 nM 5′-[³²P]-labeled 40-mer oligonucleotide duplex (○) T•G, (●)
T•Hx, (■) T•εA, (▼) T•THF and (▲) T•8oxoA. Each bar represents the mean
values of TDG activity ± SD of three independent experiments. (B)
Separation of TDG-cleavage products on denaturing PAGE. 5 nM
5′-[³²P]-labeled 40-mer T•Hx oligonucleotide duplexes, where the
T-containing strand is labeled and Hx is placed in different sequence
contexts (CXC, CXG, CXA, CXT, GXC, AXC, TXC, GXG, AXG and TXG, where X
is Hx), were incubated with 50 nM TDG for 30 min at 37°C. Arrows 40 mer
and 20 mer indicate substrate and cleavage products, respectively. The
reaction products were analyzed as described in the Materials and
Methods section.



Kinetic parameters for the excision of T paired with damaged 
adenine residues in duplex DNA by TDG 

We further substantiated the substrate specificity of TDG by measuring
the cleavage rates of the 40-mer T•G, T•Hx, T•εA, T•THF and T•8oxoA
duplexes, where T is in a TpG/CpX sequence context (where X is G or a
damaged residue), under single-turnover conditions, using a molar excess
of enzyme over DNA substrate, which provides the maximal rate of base
excision (kobs) for a given substrate (Figure 2A and Table 2). The time
course of cleavage product generation shows that T•G is the most
preferred substrate for TDG, followed by T•Hx and then by T•εA, T•THF
and T•8oxoA, which are cleaved with lower efficiency (Figure 2A).
Importantly, TDG cleaves only 60% of T•Hx, 15% of T•εA, 10% of T•THF and
3% of T•8oxoA after 90 min of incubation, and the kobs values of
TDG-catalyzed cleavage of T•Hx, T•εA, T•THF and T•8oxoA are 2.5-, 4-, 6-
and 28-fold lower than that of T•G (0.165 min⁻¹) (Table 2), indicating
that the human enzyme has lower affinity for the chemically modified,
damaged DNA as compared to the mismatched T•G duplex. Previously, it was
shown that the T•G-specific activity of TDG exhibits a strong preference
for the 5′-TpG-3′/5′-CpG-3′ context (54). Here, we observed the same
sequence context
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Figure 3. Kinetics and sequence context dependence of TDG-catalyzed 
excision of T opposite to eA. (A) Separation of TDG-cleavage products 
on denaturing PAGE. 5 nM 5'-[^^P]-labeled 40-mer T«eA oligonucleotide 
duplexes (T strand is labeled), where eA is placed in different sequence 
context: CXC, GXC, AXC, TXC, CXA and CXT where X is eA, were in- 
cubated with 50-nM TDG for 30 min at 37° C. Arrows 40 mer and 20 and 
21 mer indicate substrate and cleavage products, respectively. The reaction 
products were analyzed as described in the Materials and Methods sec- 
tion. (B) Graphic presentation of pre-steady-state single turnover kinetic 
of TDG-catalyzed cleavage of various oligonucleotide duplexes. Time ki- 
netics were performed using 500 nM TDG and 50 nM 5'-[^^P]-labeled 40- 
mer oligonucleotide duplex (o) T«G DL, (•) T«eA DL, (Q) T«eA-hp53 
and (■) T«e A-Ha. Each bar represents the mean values of TDG activity 
± SD of three independent experiments. 

preference for TDG when it excises T in T»Hx and T»eA 
duplexes. TDG excises T in 5^-TpG-375^-CpX-3^ (where X 
is Hx or eA) (Figure 2B, lanes 1, 3, 5, 7 and Figure 3 A, lane 
1) but fails or removes very weakly T in TpC, TpT and TpA 
contexts, respectively (Figure 2B, lanes 9, 11, 15, 17, 13, 19 
and Figure 3 A, lanes 3, 5 and 7). Interestingly, the nature 
of 5^-flanking base next to T has no significant influence on 
the cleavage efficiency of TDG (Figure 2B, lanes 1, 3, 5, 7 
and Supplementary Table SI). 
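The single-turnover time courses described above can be illustrated with a short Python sketch of the one-phase exponential association model, P(t) = plateau × (1 − exp(−k_obs × t)). The k_obs values are those reported in Table 2; the `plateau` parameter and the function names are assumptions introduced here for illustration, since the fitted plateaus are not given in this section.

```python
import math

# k_obs values (min^-1) as reported in Table 2.
K_OBS = {
    "T:G": 0.165,
    "T:Hx": 0.066,
    "T:eA": 0.042,
    "T:THF": 0.027,
    "T:8oxoA": 0.006,
}


def fraction_cleaved(t_min, k_obs, plateau=1.0):
    """One-phase exponential association.

    `plateau` is an illustrative free parameter (maximal fraction of
    substrate cleaved); it is not a value taken from the paper.
    """
    return plateau * (1.0 - math.exp(-k_obs * t_min))


def half_time(k_obs):
    """Time (min) needed to reach half of the plateau: ln(2) / k_obs."""
    return math.log(2) / k_obs


if __name__ == "__main__":
    for name, k in K_OBS.items():
        print(f"{name}: t1/2 = {half_time(k):.1f} min")
```

Because the half-time depends only on k_obs, it can be compared across substrates even when the plateaus differ, as the incomplete cleavage after 90 min suggests they do.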

Importantly, the aberrant activity of TDG and MBD4 on
εA-DNA shown above (Figures 1B and 3A) may have
implications for the observed hotspot mutations at adenine sites
in the ras and p53 genes of tumors induced by chemical
carcinogens (55). Indeed, in mouse liver tumors induced by
urethane and vinyl carbamate exposure, the H-ras gene is
activated by a CAA→CTA transversion mutation at codon
61 with a higher frequency compared to spontaneous
tumors. In human cancers triggered by vinyl chloride
exposure, the specific A→T transversions at codons 179 and 255
of the p53 gene also occur in the CpA context (56). To
examine whether TDG is able to excise T opposite to εA in
these mutational hotspot contexts, we constructed 40-mer
duplex oligonucleotides T•εA-Ha and T•εA-p53 containing
εA within the mouse H-ras codon 61 and human p53
codon 179 sequences (Table 3). As shown in Figure 3A,
TDG excises T opposite to εA in both codons (lanes 9 and
11), suggesting that this aberrant repair may initiate
error-prone translesion DNA synthesis across εA in mammalian
cells. However, it should be stressed that the relative
efficiency of TDG-catalyzed excision of T in T•εA-Ha and
T•εA-p53 duplexes was much lower as compared to the T•G
duplex (Figure 3B), suggesting that this aberrant activity is
minor relative to the other DNA repair functions of TDG,
but nevertheless it may play a role under genotoxic stress.

Role of cooperative DNA binding in TDG-catalyzed DNA
glycosylase activities

Previously, it has been shown that TDG and its bacterial
homolog MUG bind to DNA substrates in a cooperative
manner with a 2:1 stoichiometry. To examine a possible role of
the TDG-dimer complex in its DNA glycosylase activities,
we constructed 15-mer and 28-mer DNA duplexes containing
T•εA, T•Hx, T•G, U•G and T•8oxoA base pairs
positioned within the 5'-TpG-3'/5'-CpX-3' (where X is G or
damaged A) context. It should be noted that according to the
crystallographic studies the 28-mer but not the 15-mer duplex
can accommodate 2:1 TDG binding. As shown in Figure 4,
TDG excises T opposite to εA, Hx and G in the 28-mer duplex
with good efficiency (lanes 1, 3 and 5) but it fails to excise
T from the corresponding 15-mer duplexes (lanes 7, 9 and
11). At the same time, TDG was able to excise U opposite
to G in both 28- and 15-mer duplexes (Figure 4, lanes 13
and 15). Taken together, these results suggest that formation
of the dimeric 2:1 TDG-DNA complex is necessary for the
excision of mismatched T in duplex DNA but not for the
removal of U residues.

Next, we examined whether TDG-catalyzed excision
of 8oxoA residues depends on the size of the DNA duplex.
The full-length TDG protein excised 8oxoA opposite to T in
40- and 28-mer duplexes but not in the 15-mer duplex
(Supplementary Figure S3A). At the same time, TDG and TDGcat
were able to remove 8oxoA paired with C and G in all three
40-, 28- and 15-mer duplexes (Supplementary Figure S3A
and B). Taken together, these results suggest that the DNA
substrate specificity of TDG toward 8oxoA and mismatched T
varies strongly depending not only on the opposite base but
also on the length of the DNA duplex. The ability of TDG to
excise both 8oxoA and T in the 8oxoA•T duplex (Figure 1B,
lane 15 and Supplementary Figure S3A) is quite puzzling
since in the absence of other BER enzymes it may result in
the formation of a bi-stranded AP site cluster.

In vitro reconstitution of the DNA glycosylases-initiated BER
pathway of T•Hx pair in oligonucleotides and plasmid DNA

One of the outcomes of the TDG-catalyzed removal of T
opposite to a damaged A in duplex DNA would be DNA
polymerase synthesis across a damaged DNA template in
the downstream step of the BER pathway. Consequently, one
would expect that the removal of T in T•Hx and T•8oxoA
would direct DNA polymerase-catalyzed misincorporation
of C across Hx or 8oxoA, followed by ligation and restoration
of duplex DNA. Altogether this would result in the
persistence of a lesion and appearance of a T→C mutation in the
non-damaged DNA strand. To examine this, we have
reconstituted in vitro the BER pathway for T•Hx pair positioned

Table 2. Pre-steady-state kinetic parameters of TDG-catalyzed excision of T opposite to various DNA adducts

Substrate^a        k_obs (min⁻¹)^b
T•G (CXG)          0.165 ± 0.014
T•Hx (CXG)         0.066 ± 0.008
T•εA DL (CXC)      0.042 ± 0.012
T•THF (CXC)        0.027 ± 0.006
T•8oxoA (CXC)      0.006 ± 0.001

^a Letters within the parentheses represent nearest neighbor nucleotide sequence context where X is G, Hx, εA, THF or 8oxoA.
^b Constants were calculated by one-phase exponential association equation using GraphPad Prism 5.
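As a quick arithmetic check of the fold differences quoted in the text (2.5-, 4-, 6- and 28-fold), the ratios can be recomputed from the mean k_obs values in Table 2. This is a minimal sketch; the dictionary keys and the function name `fold_lower` are labels chosen here, not identifiers from the paper.

```python
# Mean k_obs values (min^-1) from Table 2.
K_OBS = {
    "T:G": 0.165,
    "T:Hx": 0.066,
    "T:eA": 0.042,
    "T:THF": 0.027,
    "T:8oxoA": 0.006,
}


def fold_lower(substrate, reference="T:G"):
    """How many times lower k_obs is for `substrate` than for `reference`."""
    return K_OBS[reference] / K_OBS[substrate]


if __name__ == "__main__":
    for name in K_OBS:
        if name != "T:G":
            print(f"{name}: {fold_lower(name):.1f}-fold lower than T:G")
```

The ratios use the mean values only and ignore the reported standard deviations, so they reproduce the rounded figures in the text rather than a statistical comparison.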



Table 3. The mutation spectrum in dinucleotide contexts in the human genome inferred from single nucleotide polymorphisms^a

      SNPs within a whole genome                                   SNPs within CGIs
No.   XX-YY^b  Counts    Fraction  Probability  Probability       XX-YY   Counts   Fraction  Probability  Probability
                         (%)       of XX (%)    of YY (%)                          (%)       of XX (%)    of YY (%)
1.    AT-GT    4747845   5.97      7.73         5.05              CA-CG   44879    7.99      5.60         9.89
2.    AC-AT    4728528   5.95      5.03         7.73              CG-TG   44439    7.91      9.89         5.60
3.    CG-TG    4191791   5.27      0.99         7.27              CC-CT   34022    6.06      12.43        6.56
4.    CA-CG    4189811   5.27      7.25         0.99              AG-GG   33929    6.04      6.59         12.46
5.    AA-AG    3467707   4.36      9.78         6.99              AC-GC   27555    4.91      4.26         12.02
6.    CT-TT    3466696   4.36      7.00         9.80              GC-GT   26737    4.76      12.02        4.26
7.    TA-TG    3326902   4.19      6.57         7.27              CC-TC   22113    3.94      12.43        5.65
8.    CA-TA    3310846   4.17      7.25         6.57              GA-GG   22110    3.94      5.68         12.46
9.    CC-CT    3036893   3.82      5.21         7.00              CG-GG   17020    3.03      9.89         12.46
10.   AG-GG    3033929   3.82      6.99         5.21              CC-CG   16751    2.98      12.43        9.89

^a Counts are shown in descending order and only for the first 10 SNPs. Full data set of all possible dinucleotide contexts can be found in Supplementary
Table S3.
^b XX and YY represent dinucleotide sequence context.

Figure 4. Effect of DNA substrate length on TDG-catalyzed cleavage of T•G, T•Hx, T•εA and U•G oligonucleotide duplexes. 5 nM 3'-[α-32P]cordycepin-labeled
28- and 15-mer duplexes were incubated with 50 nM TDG for 30 min at 37°C. After reaction, all samples were treated with 0.1 M NaOH for 3 min
at 99°C to cleave at AP sites. The cleavage products were analyzed as described in the Materials and Methods section. Arrows indicate substrate (29 and
16 mer) and cleavage products (12 mer).



in 5'-TpG-3'/5'-CpHx-3' context using the purified TDG,
APE1, DNA polymerase β (POLβ) and T4 DNA ligase
proteins, dNTPs and a 40-mer Hx•T duplex in which the
Hx-containing strand was labeled with [32P] at the 5' end (Figure 5,
lanes 2-6). To detect the Hx•T→Hx•C mutation, after the
reconstitution reaction we probed the DNA duplex with the restriction
endonuclease BstUI, which recognizes the sequence CG↓CG.
As expected, BstUI cleaves the Hx•T duplex only after the
reconstitution assay in the presence of BER enzymes and generates
an 18-mer cleavage fragment, indicating that the TDG-initiated
aberrant BER leads to the CGTG→CGCG mutation (lane 4).
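The logic of the BstUI readout can be sketched in a few lines of Python: a T→C change inside CGTG creates the CGCG recognition sequence, so the duplex becomes cleavable only after the aberrant repair. The 12-nt sequence below is hypothetical and chosen only for illustration; it is not the 40-mer used in the experiments, and the function names are assumptions.

```python
BSTUI_SITE = "CGCG"  # BstUI recognition sequence (palindromic)


def has_bstui_site(seq):
    """Return True if the sequence contains a BstUI recognition site."""
    return BSTUI_SITE in seq.upper()


def t_to_c(seq, pos):
    """Return `seq` with the T at 0-based index `pos` replaced by C."""
    if seq[pos].upper() != "T":
        raise ValueError(f"expected T at position {pos}, found {seq[pos]}")
    return seq[:pos] + "C" + seq[pos + 1:]


if __name__ == "__main__":
    before = "ATGACGTGCTAA"        # hypothetical fragment containing CGTG
    after = t_to_c(before, 6)      # T at index 6 is the T of CGTG
    print(before, has_bstui_site(before))
    print(after, has_bstui_site(after))
```

Because CGCG is palindromic, checking one strand is sufficient for site detection in this simplified view.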



In a control experiment, in vitro reconstitution of BER with
the ANPG protein, a human DNA glycosylase that removes
Hx, did not create a BstUI restriction site (Supplementary
Figure S4). Taken together, these results suggest that in vivo,
upon spontaneous deamination of A, TDG could introduce
A•T→G•C mutations predominantly in CpA and TpG
contexts in the absence of Hx repair and DNA replication.

Figure 5. In vitro reconstitution of the aberrant BER pathway using TDG
and Hx•T duplex. 5 nM 5'-[32P]-labeled oligonucleotide duplex was
incubated with 300 nM TDG, 5 nM APE1, 0.1 unit POLβ, 2 units T4 DNA
ligase and 50 µM of dNTPs for 30 min at 37°C. BstUI digestion was carried
out at 60°C for 40 min. Lane 1: T•Hx duplex in which the T-containing
strand is 5'-[32P]-labeled, incubated with TDG and then treated with
0.1 M NaOH to cleave at AP sites; lanes 2-8: Hx•T and Hx•C duplexes in
which the Hx-containing strand is 5'-[32P]-labeled; lanes 9 and 10: G•T
duplex in which the G-containing strand is 5'-[32P]-labeled. The reaction
products were analyzed as described in the Materials and Methods section.
Arrows indicate TDG- and BstUI-catalyzed cleavage products.

To further examine the repair of the T•Hx duplex we developed
a plasmid DNA vector carrying selectable antibiotic-resistance
markers for phenotypic selection in E. coli that
can be used for in vitro and ex vivo mammalian cell-based
transient transfection assays. The pT7Blue-3Rev-TGA-Hx
vector, carrying a T•Hx pair within an artificial
stop codon TGA (5'-TpGpA-3'/5'-TpCpHx-3') inserted
into the kanamycin-resistance (Km^R)-encoding gene 57 bp
downstream from the start codon, was used to detect the T→C
mutation by phenotypic screening of E. coli transformants.
The TDG-initiated repair of the T•Hx pair in
pT7Blue-3Rev-TGA-Hx should result in an increase of the ratio of Km^R
clones to total ampicillin-resistant (Amp^R) transformants
due to the mutation TGA→CGA in the stop codon, whereas
ANPG action should result in a decrease of the ratio of
Km^R clones due to removal of Hx and restoration of the
stop codon. For in vitro testing, the pT7Blue-3Rev-TGA-Hx
vector was transformed into E. coli either directly or
after pre-treatment with the purified TDG or ANPG
proteins. As shown in Figure 6, pre-treatment of the plasmid
DNA with TDG and ANPG resulted in a 12.6-fold increase
and a 43-fold decrease, respectively, in the relative frequency
of Km^R E. coli transformants as compared to that of the
control non-treated plasmid DNA (see also Supplementary
Table S2). These results are in agreement with data
obtained in the in vitro reconstitution assay using the T•Hx duplex
oligonucleotide (Figure 5) and indicate that TDG induces
T→C mutations whereas ANPG prevents the mutagenic
effect of Hx in E. coli. For ex vivo testing, we transfected
the pT7Blue-3Rev-TGA-Hx vector into MEF cell lines
either proficient (MEF-WT) or deficient (MEF-Tdg-/-) for
TDG. After 6 h incubation, transfected plasmid DNA was
recovered from MEFs and transformed into E. coli, and the
ratio of Km^R/Amp^R transformants was determined. The
results showed that the plasmid DNA from MEF-WT cells
yields a 1.7-fold higher frequency of Km^R clones as compared
to that of the plasmid from MEF-Tdg-/- cells (Figure 6
and Supplementary Table S3). DNA sequencing of the
plasmids from Km^R clones from in vitro and ex vivo experiments
confirmed the presence of the TGA→CGA mutation.
However, when the plasmid DNA isolated from MEF-WT cells
was treated with the purified ANPG protein, the yield of
Km^R clones decreased 7-fold as compared to that of the plasmid
DNA not treated with ANPG (Supplementary Table S3).
These results suggest that the pT7Blue-3Rev-TGA-Hx plasmid
DNA recovered 6 h after transfection into MEF-WT
cells still contains unrepaired Hx residues and that the small
difference in the relative frequencies of Km^R clones between
MEF-WT and MEF-Tdg-/- cells could also be attributed
to the presence of unrepaired plasmid molecules. Taken
together, the results obtained using the transient transfection
assay are not conclusive and other approaches are required
to address a possible role of the aberrant repair function of
TDG in vivo.

Figure 6. The relative Kan^S→Kan^R reversion rates of the plasmid DNA
treated with the purified DNA glycosylases and/or transiently transfected
into MEFs. (A) The fold change in the Kan^S→Kan^R reversion rates
between pT7Blue-3Rev-TGA-Hx plasmid DNA treated and non-treated
with ANPG. (B) The fold change in the Kan^S→Kan^R reversion rates
between pT7Blue-3Rev-TGA-Hx plasmid DNA treated and non-treated
with TDG. (C) The fold change in the Kan^S→Kan^R reversion rates
between pT7Blue-3Rev-TGA-Hx plasmid DNA transiently transfected into
MEF-WT and MEF-Tdg-/- cells. For details see Supplementary Tables
S2 and S3.
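The readout of the plasmid assay is a ratio of ratios: the fraction of kanamycin-resistant clones among ampicillin-resistant transformants, compared between treated and control plasmid. A minimal Python sketch of that calculation follows. The colony counts are hypothetical placeholders, not data from this study (the measured values are in Supplementary Tables S2 and S3), and the function names are assumptions.

```python
def resistant_fraction(km_colonies, amp_colonies):
    """Fraction of Km-resistant clones among Amp-resistant transformants."""
    if amp_colonies <= 0:
        raise ValueError("amp_colonies must be positive")
    return km_colonies / amp_colonies


def fold_change(treated_fraction, control_fraction):
    """Relative reversion frequency of treated versus control plasmid."""
    if control_fraction <= 0:
        raise ValueError("control_fraction must be positive")
    return treated_fraction / control_fraction


if __name__ == "__main__":
    # Hypothetical colony counts, for illustration only.
    control = resistant_fraction(km_colonies=10, amp_colonies=5000)
    treated = resistant_fraction(km_colonies=50, amp_colonies=5000)
    print(f"fold change = {fold_change(treated, control):.1f}")
```

A fold change above 1 corresponds to an increase in reversion (as reported for TDG pre-treatment), and a value below 1 to a decrease (as reported for ANPG pre-treatment).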

Repair activities on oligonucleotide duplexes containing T•G,
T•Hx and 8oxoA•G pairs in mouse cell-free extracts

Based on the above observations, we conclude that the T•Hx
pair in 5'-TpG-3'/5'-CpHx-3' context will be a target for
at least two human enzymes, TDG and ANPG. Therefore,
in vivo the order of action of these two enzymes on the T•Hx
duplex will determine the mutagenic outcome of the BER
pathway. To examine this, we measured thymine- and
Hx-DNA glycosylase activities in nuclear and cytosolic
cell-free extracts prepared from WT and Tdg-/- MEFs using
5'-[32P]-labeled 40-mer T•G, T•Hx, Hx•T and 8oxoA•G
oligonucleotide duplexes as DNA substrates. It should be
noted that in these duplexes only the upper T-, T-, Hx- and
8oxoA-containing DNA strands were labeled, respectively.
In addition, the 40-mer T•Hx and Hx•T duplexes have the
same sequence and contain a single Hx residue placed
either in CXC or GXC sequence context, where X is Hx.
As shown in Figure 7A, incubation of the 40-mer T•G duplex
in cytosolic and nuclear extracts from MEF-WT generated
a 19-mer cleavage fragment, indicating the presence of a
robust mismatch-specific thymine-DNA glycosylase activity
(lanes 2 and 3). Of note, the nuclear extracts from MEF-WT
exhibited higher cleavage efficiency (lane 2) as compared
to that of the cytosolic extracts from the same cells
(lane 3), suggesting preferential nuclear distribution of TDG
in MEFs. As expected, both cytosolic and nuclear extracts
from MEF-Tdg-/- completely lack thymine-DNA glycosylase
activity (lanes 4 and 5), indicating the absence of TDG
in these cells. Incubation of the 40-mer T•Hx duplex, in
which the T-containing strand is 5'-[32P]-labeled, with extracts
from MEF-WT generated a 20-mer cleavage fragment (lanes
7 and 8), with much lower efficiency as compared to the T•G
duplex, indicating excision of T at position 21 opposite
to Hx. Interestingly, the cytosolic extracts from MEF-WT
exhibited higher activity on the T•Hx duplex (lane 7) as
compared to nuclear extracts (lane 8), which is opposite to the
cleavage pattern observed on the T•G duplex (lanes 2 and 3).
Importantly, no cleavage of the 40-mer T•Hx duplex was
observed in either cytosolic or nuclear extracts from
MEF-Tdg-/- (lanes 9 and 10), indicating that TDG is a major
DNA glycosylase that excises T opposite to Hx in mammalian
cells. Next, we examined Hx-DNA glycosylase activity
in MEF extracts using the 40-mer Hx•T duplex that
has the same sequence as the T•Hx one and in which the
Hx-containing strand is 5'-[32P]-labeled. Nuclear extracts from
both WT and Tdg-/- MEFs exhibited highly efficient
Hx-DNA glycosylase activity that cleaves most of the Hx•T
duplex and generates a 19-mer product (lanes 13 and 15). In
contrast, cytosolic extracts from MEFs exhibited much weaker
Hx-DNA glycosylase activity (lanes 12 and 14), suggesting
that the mouse ANPG protein is mainly localized in the nucleus
of MEFs, with very little cytoplasmic distribution. Taken
together, these results indicate that mammalian TDG can
catalyze aberrant removal of T in the T•Hx duplex in
cell-free extracts, although to a much lesser extent as compared
to its damage-specific DNA glycosylase activities on T•G
and 8oxoA•G substrates. Furthermore, TDG and ANPG
exhibit antagonistic DNA glycosylase activities on the T•Hx
duplex: in the nuclear extracts, highly efficient excision of
Hx by ANPG inhibits the thymine-DNA glycosylase activity of
TDG (lanes 13 and 8), whereas in the cytosolic extracts, the weak
Hx-activity of ANPG stimulates TDG-catalyzed aberrant
removal of T opposite to Hx (lanes 12 and 7).

Figure 7. DNA glycosylase activities in cytosolic and nuclear protein
extracts from MEF-WT and MEF-Tdg-/- cells. Denaturing PAGE analysis
of the cleavage products after incubation of 5'-[32P]-labeled 40-mer
oligonucleotide duplexes containing T•G, T•Hx, Hx•T and 8oxoA•G
base pairs with cell-free extracts. The DNA repair assay (50 µl) was
performed in BER+EDTA buffer containing 2.5 nM 5'-[32P]-labeled duplex,
50 mM KCl, 20 mM HEPES-KOH (pH 7.6), 0.1 mg·ml⁻¹ BSA, 1 mM
DTT, 1 mM EDTA and 30 µg of either cytosolic or nuclear protein
extracts from MEFs. Reaction mixtures were incubated for 4 h or 100 min
at 37°C to measure TDG- or ANPG-catalyzed activities, respectively. (A)
DNA glycosylase activities on Hx and mismatched T residues in the
cell-free extracts. Lanes 1-5: T•G duplex in which the T-containing strand
is 5'-[32P]-labeled and G opposite to T in the complementary strand is in
the sequence context CGG; lanes 6-10: T•Hx duplex in which the T strand
is labeled and Hx opposite to T is in the context CXC, where X is Hx; lanes
11-15: Hx•T duplex in which the Hx strand is labeled and T opposite to
Hx is in the context GTG. (B) DNA glycosylase activities on T, Hx and
8oxoA residues in the cell-free extracts. Lanes 1-5: T•Hx duplex in which
the T-containing strand is labeled and Hx opposite to T is in the context
GXC, where X is Hx; lanes 6-10: Hx•T duplex in which the Hx strand is
labeled and T opposite to Hx is in the context GTC; lanes 11-15: 8oxoA•G
duplex in which the 8oxoA strand is labeled and G opposite to 8oxoA is in
the context GGG. Arrows denote the 40-mer DNA substrate and 20- and
19-mer cleavage products. For details see the Materials and Methods section.

Next, we examined whether TDG also exhibits sequence
context preference in the cell-free extracts. As shown
in Figure 7B, no thymine-DNA glycosylase activity was
observed in extracts from MEF-WT on the T•Hx (GXC) duplex,
in which T is in GpTpC/GpXpC context, where X is Hx
(lanes 7 and 8), indicating that TDG in cell-free extracts also
exhibits a strong preference for the 5'-TpG-3'/5'-CpX-3' context,
where X is a damaged adenine residue. This result is in
agreement with data obtained with the purified TDG
protein (Figure 2B, lane 9). Importantly, the nuclear extracts
from MEFs exhibited efficient Hx-DNA glycosylase activity
on the Hx•T (GXC) duplex, in which the Hx-containing strand
is 5'-[32P]-labeled (Figure 7B, lanes 13 and 15), indicating
that the activity of the mouse ANPG is not dependent on
sequence context. In addition, we measured TDG-catalyzed
8oxoA-DNA glycosylase activity in extracts from MEFs
using the 8oxoA•G duplex, in which the 8oxoA-containing strand
is 5'-[32P]-labeled (Figure 7B, lanes 11-15). Again, the
nuclear extracts exhibited higher activity as compared to the
cytosolic ones (lane 13 versus 12) from MEF-WT, and no
8oxoA activity was observed in the extracts from
MEF-Tdg-/-. These results confirm that TDG is a major T- and
8oxoA-DNA glycosylase in mammalian cells and show a
preferential nuclear distribution of TDG in MEFs.

Comparison of SNPs in whole genome and CGIs

Based on our biochemical data, we propose that in vivo
TDG may promote mutagenic conversion of both CpA and
TpG dinucleotides to CpG ones. In chromosomes, TDG
is mainly localized in CGIs of promoter regions to
protect them from cytosine methylation by the DNMT3a and
DNMT3b de novo methylases; therefore, CGIs may exhibit
a mutational bias for TpG, CpA→CpG mutations. To
examine this, we analyzed the mutation spectra in the human genome
inferred from the SNP data (NCBI dbSNP build 127 for
human). For this, we measured the relative frequencies of base
substitutions within dinucleotide contexts. Remarkably, the
relative frequencies of CpA→CpG (7.99% of total SNPs)
and CpG→TpG (7.91%) in CGIs of the human genome were
the highest among all types of SNPs, suggesting that TpG
and CpA dinucleotides are mutation hotspots in CpG-rich
regions (Table 3 and Supplementary Table S4). At the same
time, the highest frequencies of SNPs for the whole genome
were ApT→GpT (5.97% of total SNPs) and ApC→ApT
(5.95%), followed by TpG and CpA (5.27% each),
indicating that the genome-wide mutation spectra are
different from those of CGIs (Table 3 and Supplementary Table
S4). Importantly, the probability of occurrence of CpA and
CpG dinucleotides in CGIs is 5.6% and 9.89%, whereas in
the whole genome it is the opposite, 7.25% and 0.99%,
respectively. These dramatic imbalances in the frequencies of
dinucleotide occurrence between the whole genome and CGIs
might be due to the directionality of spontaneous mutagenesis
in vivo: at the genome-wide level, CpG→TpG/CpA
mutations are more frequent than the reverse mutations because
of the specific pattern of DNA methylation, whereas in the
CGIs, this is in fact the opposite, as TpG and CpA mutate
more frequently to CpG, possibly due to TDG-catalyzed
aberrant BER.
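One simple way to relate the SNP fractions to dinucleotide abundance is to divide each fraction by the probability of occurrence of the XX dinucleotide, using the values listed in Table 3. The Python sketch below does this for the CpA/CpG-related rows. This normalization is an illustrative assumption made here, not the analysis performed in the study, and it treats XX as the starting dinucleotide even though SNP data are not polarized.

```python
# (region, XX-YY, fraction of SNPs in %, probability of XX in %), from Table 3.
ROWS = [
    ("genome", "CG-TG", 5.27, 0.99),
    ("genome", "CA-CG", 5.27, 7.25),
    ("CGI", "CA-CG", 7.99, 5.60),
    ("CGI", "CG-TG", 7.91, 9.89),
]


def fraction_per_abundance(fraction_pct, p_xx_pct):
    """SNP fraction divided by the abundance of the XX dinucleotide."""
    if p_xx_pct <= 0:
        raise ValueError("p_xx_pct must be positive")
    return fraction_pct / p_xx_pct


NORMALIZED = {
    (region, pair): fraction_per_abundance(fraction, p_xx)
    for region, pair, fraction, p_xx in ROWS
}

if __name__ == "__main__":
    for (region, pair), value in NORMALIZED.items():
        print(f"{region:6s} {pair}: {value:.2f}")
```

Under this crude normalization, CG-TG stands out genome-wide relative to the low CpG abundance, whereas in CGIs the CA-CG value exceeds the CG-TG value, which is in line with the direction of the bias described in the text.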

DISCUSSION

Here, we report that the TDG protein is able to excise T
opposite to various adenine lesions with good efficiency
when it is placed in the specific 5'-TpG-3'/5'-CpX-3' (where
X is Hx, εA, AP site or 8oxoA) context. Importantly, under
our experimental conditions we do not observe TDG
activity toward the non-damaged A•T duplex. Also, TDG-catalyzed
excision of T strongly depends on the TpG dinucleotide
context, as the enzyme was not at all or only weakly active
in the TpA, TpC and TpT contexts (Figures 2B and 3A).
However, in the present study, we did not perform an
exhaustive search for all possible sequence contexts and DNA
lesions; hence one cannot exclude the possibility of
a weak TDG activity on certain non-TpG contexts.
Interestingly, MBD4, a functional human counterpart of
TDG, is also able to remove T opposite to an εA residue but
not across other types of adenine damage (Figure 1B and
Supplementary Figure S2). This aberrant excision of
non-damaged T could lead to translesion repair synthesis across
the εA adduct and introduce A•T→T•A transversions that
were typically observed previously in plasmid transfection
experiments and in human liver angiosarcomas
associated with exposure to vinyl chloride (55,57). Intriguingly,
the carcinogen-induced A→T hotspot mutations occurred
at CpA sites in codon 61 of the c-Ha-ras gene and codons
179 and 255 of the p53 gene, suggesting a possible
involvement of MBD4 and TDG in the vinyl chloride-induced
tumors (55). The aberrant BER initiated by thymine-DNA
glycosylases on εA•T and 8oxoA•T duplexes may
explain the higher mutagenic potentials of exocyclic DNA
adducts and oxidized adenine residues in mammalian cells,
compared to E. coli, as described in previous studies
(57,58).

The ability of TDG to excise T opposite to 8oxoA, εA
and/or AP site with significant efficiency is quite
unexpected because of the requirement of G in the complementary
strand for excision of both U and T residues by TDG.
Indeed, crystal structures of TDGcat in complex with various
DNA substrates containing either G or A opposite to the
lesion showed that G, but not A, is contacted by the
backbone oxygens of amino acids Ala274 and Pro280 (14,23,45).
These specific interactions explain the preference of TDG
for G over non-damaged A in the complementary strand.
However, it is not clear whether TDG could interact with
modified adenine residues such as Hx, 8oxoA and εA in
mismatches with T. Moreover, the capacity of TDG to
excise εC, 5caC and 5fC without a requirement for the opposite
base (7,23) suggests that the nature of the target base and
of the specific interactions between the base and protein
residues within the active site of the enzyme play a
primordial role in DNA substrate recognition by TDG.
Crystallographic studies of MBD4 in complex with DNA
revealed that, in contrast to TDG, MBD4 has very limited
DNA substrate preference due to the conformation of its
active site pocket and due to specific interactions between the
orphan G and Arg468 that help to stabilize the flipped-out
target base for efficient catalysis (46). Still, MBD4 can
excise with very low efficiency T opposite to εA, but not
opposite to Hx or 8oxoA, suggesting that Arg468 may
interact with unpaired εA in the enzyme-substrate complex.
Overall, the available crystallographic data on DNA
glycosylase-substrate complexes do not provide sufficient insight
(i) into the structural basis of the strong sequence
context dependence of excision of mismatched T by TDG and
MBD4 and (ii) into whether the DNA glycosylases are able to
make specific contacts with the orphan damaged A residue
during DNA substrate recognition.

The phenomenon of cooperative DNA binding by repair
enzymes has been studied previously (24,25). It was
proposed that this mode of binding may protect DNA and
help to coordinate downstream repair steps (25). Here, we
demonstrated that long 28-40-mer DNA duplexes are
required for efficient excision of mismatched T and 8oxoA,
but not U residues, by full-length TDG (Figure 4 and
Supplementary Figure S3), suggesting that cooperative
binding of TDG to DNA is required for efficient processing
of a number of DNA substrates. The molecular
mechanism and structural basis for this molecular recognition
phenomenon remain unclear. We may propose that
formation of oligomeric TDG filaments along the length of DNA
promoter regions enables both the protection from de novo
DNA methylation and the efficient repair of mismatched
thymine residues.

In vitro reconstitution of the TDG-initiated BER
pathway of the T•Hx duplex and the plasmid mutagenesis
studies (Figures 5 and 6 and Supplementary Tables S2 and S3)
demonstrated that the aberrant excision of T by TDG can
introduce TpG, CpA→CpG mutations in the absence of
DNA replication. Data obtained in cell-free extracts from
MEF-WT and MEF-Tdg-/- cells indicate that mammalian
TDG is the main DNA glycosylase involved in the aberrant
repair of T•Hx and 8oxoA•G duplexes in mouse cells
(Figure 7). Importantly, in cell-free extracts, ANPG-catalyzed
Hx and εA removal can efficiently prevent TDG-mediated
aberrant removal of T in the complementary strand in the
5'-TpG-3'/5'-CpX-3' sequence context, although not
completely (Figure 7A). Therefore, in vivo a minor fraction
of T•Hx and T•εA base pairs in DNA could be repaired
by TDG in an aberrant error-prone manner. Interestingly, the
nuclear fraction of TDG associates tightly with
euchromatin and binds to the CpG-rich promoters of transcribed
genes including the pluripotency and developmental genes
(31,59). It was suggested that TDG scans CpG sites for
mismatches and regulates gene expression, either by
protecting CpG from de novo DNA methylation or via
interaction with chromatin and transcription factors. Comparison
of the spontaneous genome-wide mutation spectra versus
those at the CGIs using the human SNP database revealed that
CGIs, in contrast to the whole genome, exhibit a strong
mutational bias for TpG, CpA→CpG mutations (Table 3 and
Supplementary Table S4). The observed mutation spectra
suggest that the TDG-catalyzed aberrant BER might be
involved in the stabilization and extension of CG content in
CpG-rich promoters.

Based on our results, it is tempting to speculate that
the increased rate of TpG, CpA→CpG mutations in CGIs
would depend on their transcriptional activity given the
association of TDG with the CpG-rich promoter regions to
protect them from de novo methylation. Furthermore,
emergence of CGIs at TSSs of genes during evolution of
warm-blooded vertebrates (36) may be a direct consequence of
the TDG-catalyzed aberrant BER pathway of spontaneous
and oxidative damage to adenine residues occurring within
regulatory regions of a genome. Here, we propose a model
in which the non-methylated regulatory DNA regions
associated with TDG would be driven toward an increased
CpG dinucleotide content by conversion of CpA and TpG
to CpG dinucleotides, whereas the reverse processes would
occur in methylated genome areas due to spontaneous
deamination of 5mC residues (Figure 8). In line with our
hypothetical mechanism, it has been shown that the relative rate
of CpG-rich promoter evolution is substantially accelerated
in primates and/or remains neutral in mammals (39).
Furthermore, the presence of multiple 5mC-DNA glycosylases
in plants that contain mismatch-specific thymine-DNA
glycosylase activity may provide a rationale for the increased
CG content of the plant genomes despite dense
methylation of cellular DNA (60). Finally, the aberrant BER
pathway described here could be considered as an evolutionary
capacitor mechanism that accelerates the mutation rate at the
regulatory DNA sequences, which enables environmentally
induced gene expression to become a developmentally
programmed expression pattern, which in turn may play a role
in the formation of the placenta and exponential growth in
taxonomic diversity of mammals (61).

Figure 8. Dynamics of mutational spectra in the mammalian genomes. (A)
Mammalian DNA methyltransferases (DNMTs) transfer a methyl group to
cytosines at position 5 in CpG dinucleotide contexts, resulting in 5mC
residues in DNA. The spontaneous deamination of 5mC induces the
transition mutations CpG→TpG and CpG→CpA. (B) TDG localizes in CGI
promoter regions to protect them from de novo DNA methylation either in
a passive or an active TET-mediated manner. Spontaneous damage to
adenines in TpG and CpA dinucleotide contexts in non-methylated DNA
promotes the TDG-catalyzed aberrant BER and the TpG→CpG and CpA→CpG
transition mutations.
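The qualitative model of opposing mutational flows can be illustrated with a deliberately simple two-state toy calculation in Python, in which a site is either CpG or TpG/CpA and converts between the two states with constant per-generation probabilities. The rate values and function names are hypothetical and chosen only to show the behavior; they are not estimates from this study.

```python
def equilibrium_cpg_fraction(gain_rate, loss_rate):
    """Steady-state CpG fraction for a two-state model.

    gain_rate: probability per step that TpG/CpA becomes CpG
    loss_rate: probability per step that CpG becomes TpG/CpA
    """
    total = gain_rate + loss_rate
    if total <= 0:
        raise ValueError("at least one rate must be positive")
    return gain_rate / total


def simulate_cpg_fraction(start, gain_rate, loss_rate, steps):
    """Iterate f <- f + gain_rate * (1 - f) - loss_rate * f for `steps` steps."""
    fraction = start
    for _ in range(steps):
        fraction += gain_rate * (1.0 - fraction) - loss_rate * fraction
    return fraction


if __name__ == "__main__":
    # Hypothetical rates: loss dominates in methylated DNA, gain in CGIs.
    print(equilibrium_cpg_fraction(gain_rate=0.01, loss_rate=0.03))
    print(equilibrium_cpg_fraction(gain_rate=0.03, loss_rate=0.01))
```

In this toy picture the long-term CpG content depends only on the ratio of the two rates, which is why a local shift in the balance between 5mC deamination and aberrant BER would be expected to move a region toward lower or higher CpG content.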
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